Automatic Declassification of Textual Documents by Generalizing Sensitive Terms
Authors
Abstract
With the advent of the Internet, large numbers of text documents are published and shared every day. Each of these documents contains a vast amount of information, and publicly sharing some of it can compromise privacy if the information is confidential. Before a document is published, sanitization operations are therefore performed on it to preserve privacy while retaining the document's utility. Various schemes have been developed to address this problem, but most are domain-specific and fail to account for semantically correlated terms. This paper presents a generalized sanitization method that discovers sensitive information based on the notion of information content. The proposed method removes confidential information from a text document by first identifying independently sensitive terms. Using these sensitive terms, it then discovers the correlated terms that pose a disclosure threat. Finally, a generalization algorithm replaces the sensitive and correlated terms with high disclosure risk by more general terms.
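The core idea can be illustrated with a minimal sketch: score each term by its information content, IC(t) = -log2 p(t), where p(t) is the term's relative frequency in a reference corpus, and generalize any term whose IC exceeds a threshold. The `sanitize` function, the toy corpus, the threshold value, and the hypernym map below are all hypothetical illustrations, not the paper's actual algorithm or data.

```python
import math
from collections import Counter

def information_content(term, corpus_counts, total):
    """IC(t) = -log2 p(t): rarer terms carry more information."""
    freq = corpus_counts.get(term, 0)
    if freq == 0:
        return float("inf")  # unseen terms are treated as maximally informative
    return -math.log2(freq / total)

def sanitize(document_terms, corpus_counts, threshold, hypernyms):
    """Replace terms whose IC exceeds the threshold with a more general term."""
    total = sum(corpus_counts.values())
    sanitized = []
    for term in document_terms:
        if information_content(term, corpus_counts, total) > threshold:
            # Generalize via a (hypothetical) hypernym map; fall back to a tag.
            sanitized.append(hypernyms.get(term, "[GENERALIZED]"))
        else:
            sanitized.append(term)
    return sanitized

# Toy reference corpus: "AIDS" is rare, so its IC is high and it is generalized.
corpus = Counter({"the": 1000, "patient": 50, "disease": 40, "has": 200, "AIDS": 2})
doc = ["the", "patient", "has", "AIDS"]
print(sanitize(doc, corpus, threshold=6.0, hypernyms={"AIDS": "disease"}))
# → ['the', 'patient', 'has', 'disease']
```

A full implementation would also propagate risk to semantically correlated terms (e.g. terms that frequently co-occur with a sensitive term), which this sketch omits.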
Similar articles
Utility-preserving sanitization of semantically correlated terms in textual documents
Traditionally, redaction has been the method chosen to mitigate the privacy issues related to the declassification of textual documents containing sensitive data. This process is based on removing sensitive words in the documents prior to their release and has the undesired side effect of severely reducing the utility of the content. Document sanitization is a recent alternative to redaction, w...
A survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
Toward a Framework for the Large Scale Textual and Contextual Analysis of Government Information Declassification Patterns
The US government protects a massive amount of secret data as part of its Security Classification System. This information is expensive to protect and maintain. In order to keep citizens informed, as well as to keep costs down, the government is constantly releasing newly declassified documents to the public. According to OpenTheGovernment.org’s annual Secrecy Report Card, human readers manuall...
Detecting Term Relationships to Improve Textual Document Sanitization
Nowadays, the publication of textual documents provides critical benefits to scientific research and business scenarios where information analysis plays an essential role. Nevertheless, the possible existence of identifying or confidential data in this kind of documents motivates the use of measures to sanitize sensitive information before being published, while keeping the innocuous data unmod...
Utility-preserving privacy protection of textual healthcare documents
The adoption of ITs by medical organisations makes possible the compilation of large amounts of healthcare data, which are quite often needed to be released to third parties for research or business purposes. Many of this data are of sensitive nature, because they may include patient-related documents such as electronic healthcare records. In order to protect the privacy of individuals, several...